Take-home Exercise 3

In this Take-home Exericse, hedonic pricing models will be build to explain the factors affecting resale prcies of public housing in Singapore. The hedonic price models will be built using approriate Geographical Weighted Regression (GWR) methods.

Ngah Xin Yan https://github.com/nxinyan/
11-05-2021

Overview

When buying a house, it is generally a major investment for most people. There are many factors that affects the housing prices, such as inflation rates, economy of country, which are more global in nature. There are other factors that are specific to the properties itself also, which can be further divided to structural and locational factors. Structural factors are variables related to the property themselves such as the size, fitting, and tenure of the property. Locational factors are variables related to the neighbourhood of the properties such as proximity to childcare centre, public transport service and shopping centre.

The effect of housing factors in relation to price will be examined using hedonic pricing models. Ordinary Least Square (OLS) method is used to built the model, however it fails to take into consideration that spatial autocorrelation and spatial heterogeneity exist in geographic data sets such as housing transactions. The estimation of OLS of hedonic pricing models could lead to biased, inconsistent or inefficient results. GWR was introduced to calibrate hedonic price model for housing to due with this limitation.

Objective

To build hedonic pricing models to explain factors affecting the resale prices of public housing in Singapore. Appropriare GWR methods must be used to built these models.

Dataset Used

Installing and Loading the R packages

packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'httr', 'sp', 'matrixStats')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

Handling Geospatial Data

Importing Geospatial Data

childcare_sf <- st_read("data/geospatial/child-care-services-geojson.geojson")
Reading layer `child-care-services-geojson' from data source 
  `C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial\child-care-services-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 1545 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6824 ymin: 1.248403 xmax: 103.9897 ymax: 1.462134
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
eldercare_sf <- st_read(dsn = "data/geospatial", layer="ELDERCARE")
Reading layer `ELDERCARE' from data source 
  `C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 120 features and 19 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21 / Singapore TM
mrtlrt_sf <- st_read(dsn = "data/geospatial", layer="MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source 
  `C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 171 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
park_sf <- st_read("data/geospatial/parks-geojson.geojson")
Reading layer `parks-geojson' from data source 
  `C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial\parks-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 350 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6929 ymin: 1.214058 xmax: 104.0017 ymax: 1.461503
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
presch_sf <- st_read("data/geospatial/pre-schools-location-geojson.geojson")
Reading layer `pre-schools-location-geojson' from data source 
  `C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial\pre-schools-location-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6824 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
supermarket_sf <- st_read("data/geospatial/supermarkets-geojson.geojson")
Reading layer `supermarkets-geojson' from data source 
  `C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial\supermarkets-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 526 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6258 ymin: 1.24715 xmax: 104.0036 ymax: 1.461526
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
mpsz_sf <- st_read(dsn="data/geospatial",
               layer="MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

Checking for invalid geometries

length(which(st_is_valid(childcare_sf) == FALSE))
[1] 0
length(which(st_is_valid(eldercare_sf) == FALSE))
[1] 0
length(which(st_is_valid(mrtlrt_sf) == FALSE))
[1] 0
length(which(st_is_valid(park_sf) == FALSE))
[1] 0
length(which(st_is_valid(presch_sf) == FALSE))
[1] 0
length(which(st_is_valid(supermarket_sf) == FALSE))
[1] 0
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 9

No invalid geometries is observed in childcare_sf, eldercare_sf, mrtlrt_sf, presch_sf, park_sf and supermarket_sf data frames. However, mpsz_sf data frame consists of 9 invalid geometries. The invalid geometries will be removed in the mpsz_sf data frame using st_make_valid() and check for any invalid geometries again.

mpsz_sf <- st_make_valid(mpsz_sf)
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 0

Checking for missing values

childcare_sf[rowSums(is.na(childcare_sf))!=0,]
Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] Name        Description geometry   
<0 rows> (or 0-length row.names)
eldercare_sf[rowSums(is.na(eldercare_sf))!=0,]
Simple feature collection with 120 features and 19 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21 / Singapore TM
First 10 features:
   fid OBJECTID ADDRESSBLO ADDRESSBUI ADDRESSPOS
1    1        1       <NA>       <NA>     601318
2    2        2       <NA>       <NA>     462509
3    3        3       <NA>       <NA>     640190
4    4        4       <NA>       <NA>     190005
5    5        5       <NA>       <NA>     160044
6    6        6       <NA>       <NA>     160117
7    8        8       <NA>       <NA>     731569
8    9        9       <NA>       <NA>     651210
9   10       10       <NA>       <NA>     540182
10  11       11       <NA>       <NA>     228080
                              ADDRESSSTR ADDRESSTYP DESCRIPTIO
1      318A Jurong East Avenue 1 #02-308       <NA>       <NA>
2  Blk 509B Bedok North Street 3 #02-157       <NA>       <NA>
3         Blk 190 Boon Lay Drive #01-242       <NA>       <NA>
4                    5 Beach Rd #02-4943       <NA>       <NA>
5             Blk 44 Beo Crescent #01-67       <NA>       <NA>
6     Blk 117 Jalan Bukit Merah #01-1683       <NA>       <NA>
7              569A Champion Way #01-346       <NA>       <NA>
8         210A Bukit Batok St 21 #01-294       <NA>       <NA>
9    Blk 182 Rivervale Crescent\n#01-311       <NA>       <NA>
10                        81 Wilkie Road       <NA>       <NA>
   HYPERLINK LANDXADDRE LANDYADDRE
1       <NA>          0          0
2       <NA>          0          0
3       <NA>          0          0
4       <NA>          0          0
5       <NA>          0          0
6       <NA>          0          0
7       <NA>          0          0
8       <NA>          0          0
9       <NA>          0          0
10      <NA>          0          0
                                                     NAME PHOTOURL
1                            Yuhua Senior Activity Centre     <NA>
2                                    THK SAC @ Kaki Bukit     <NA>
3                                      THK SAC @ Boon Lay     <NA>
4                  PEACE-Connect Senior Activity Centre@5     <NA>
5                                  THK SAC @ Beo Crescent     <NA>
6                                Silver ACE @ Bukit Merah     <NA>
7              Care Corner Senior Activity Centre (WL569)     <NA>
8     Fei Yue Senior Activity Centre (Bukit Batok Branch)     <NA>
9  COMNET Senior Activity Centre @ 182 Rivervale Crescent     <NA>
10                   Abdullah Shooker Jewish Welfare Home     <NA>
   ADDRESSFLO          INC_CRC FMEL_UPD_D ADDRESSUNI   X_ADDR
1        <NA> 2B0DB92FDD914FFC 2016-07-28       <NA> 16614.08
2        <NA> 82728FA30612F3FD 2016-07-28       <NA> 38803.81
3        <NA> DE7A8D4EA0BD1D9B 2016-07-28       <NA> 14481.92
4        <NA> A2C058FC5785F7FE 2016-07-28       <NA> 31505.35
5        <NA> 9DBFD51E056AEE70 2016-07-28       <NA> 27218.35
6        <NA> 169DABA5B6ECEA87 2016-07-28       <NA> 27278.94
7        <NA> 4DC6800E9BB385BE 2016-07-28       <NA> 23147.94
8        <NA> EFBD712DA5DD6FEC 2016-07-28       <NA> 18820.58
9        <NA> 6BB0D7698D7B4C7D 2016-07-28       <NA> 36446.37
10       <NA> EBA4B3E6F7F5B8F0 2016-07-28       <NA> 29544.78
     Y_ADDR                  geometry
1  36639.12 POINT (16614.08 36639.12)
2  35098.78 POINT (38803.81 35098.78)
3  36357.61 POINT (14481.92 36357.61)
4  31853.52 POINT (31505.35 31853.52)
5  30135.49 POINT (27218.35 30135.49)
6  29350.17 POINT (27278.94 29350.17)
7  45761.17 POINT (23147.94 45761.17)
8  36396.32 POINT (18820.58 36396.32)
9  41376.90  POINT (36446.37 41376.9)
10 31691.08 POINT (29544.78 31691.08)
mrtlrt_sf[rowSums(is.na(mrtlrt_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO   geometry
<0 rows> (or 0-length row.names)
park_sf[rowSums(is.na(park_sf))!=0,]
Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] Name        Description geometry   
<0 rows> (or 0-length row.names)
presch_sf[rowSums(is.na(presch_sf))!=0,]
Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] Name        Description geometry   
<0 rows> (or 0-length row.names)
supermarket_sf[rowSums(is.na(supermarket_sf))!=0,]
Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] Name        Description geometry   
<0 rows> (or 0-length row.names)
mpsz_sf[rowSums(is.na(mpsz_sf))!=0,]
Simple feature collection with 0 features and 15 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
 [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
 [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
[13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
<0 rows> (or 0-length row.names)

Only eldercare_sf contains missing values. Columns with missing values are removed.

eldercare_sf = select(eldercare_sf, -3, -4, -7,-8, -9, -13, -14, -17)

Checking for missing values again

eldercare_sf[rowSums(is.na(eldercare_sf))!=0,]
Simple feature collection with 0 features and 11 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
 [1] fid        OBJECTID   ADDRESSPOS ADDRESSSTR LANDXADDRE LANDYADDRE
 [7] NAME       INC_CRC    FMEL_UPD_D X_ADDR     Y_ADDR     geometry  
<0 rows> (or 0-length row.names)

Checking for duplicates

any(duplicated(childcare_sf))
[1] FALSE
any(duplicated(eldercare_sf))
[1] FALSE
any(duplicated(mrtlrt_sf))
[1] FALSE
any(duplicated(park_sf))
[1] FALSE
any(duplicated(presch_sf))
[1] FALSE
any(duplicated(supermarket_sf))
[1] FALSE
any(duplicated(mpsz_sf))
[1] FALSE

There is no duplicated in the data sets above.

Checking of coordinate reference system (CRS)

st_crs() of sf package is used to display the coordinate reference system information.

st_crs(childcare_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(eldercare_sf)
Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(mrtlrt_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(park_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(presch_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(supermarket_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(mpsz_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

From the output above, there are datasets that has EPSG code of 4326. and 9001 The correct coordinate system code for SVY21 is 3414 instead of 4326 and 9001. st_transform() of sf package will be used to assign the correct EPSG code to all data frames.

childcare_sf <- st_transform(childcare_sf, 3414)
eldercare_sf <- st_transform(eldercare_sf, 3414)
mrtlrt_sf <- st_transform(mrtlrt_sf, 3414)
park_sf <- st_transform(park_sf, 3414)
presch_sf <- st_transform(presch_sf, 3414)
supermarket_sf <- st_transform(supermarket_sf, 3414)
mpsz_sf <- st_transform(mpsz_sf, 3414)

Checking CRS again

st_crs(childcare_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(eldercare_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(mrtlrt_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(park_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(presch_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(supermarket_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(mpsz_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

All dataframes has the EPSG code of 3414 now.

Displaying basic information of the data frames

st_geometry(childcare_sf)
Geometry set for 1545 features 
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 11203.01 ymin: 25667.6 xmax: 45404.24 ymax: 49300.88
z_range:       zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(eldercare_sf)
Geometry set for 120 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(mrtlrt_sf)
Geometry set for 171 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(park_sf)
Geometry set for 350 features 
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 12373.11 ymin: 21869.93 xmax: 46735.95 ymax: 49231.09
z_range:       zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(presch_sf)
Geometry set for 1925 features 
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 11203.01 ymin: 25596.33 xmax: 45404.24 ymax: 49300.88
z_range:       zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(supermarket_sf)
Geometry set for 526 features 
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 4901.188 ymin: 25529.08 xmax: 46948.22 ymax: 49233.6
z_range:       zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(mpsz_sf)
Geometry set for 323 features 
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

All geometry type are points except for mpsz_sf which

Visualisation of Geospatial Data

For childcare and eldercare

tmap_mode('view')
tm_shape(mpsz_sf) +
  tm_polygons() +
tm_shape(childcare_sf) +
  tm_dots(col="red") +
tm_shape(eldercare_sf) + 
  tm_dots(col="blue") +
  tm_view(set.zoom.limits = c(11,14))
tmap_mode('plot')

For mrtlrt and park

tmap_mode('view')
tm_shape(mpsz_sf) +
  tm_polygons() +
tm_shape(mrtlrt_sf) +
  tm_dots(col="red") +
tm_shape(park_sf) + 
  tm_dots(col="blue") +
  tm_view(set.zoom.limits = c(11,14))
tmap_mode('plot')

For pre school and supermarket

tmap_mode('view')
tm_shape(mpsz_sf) +
  tm_polygons() +
tm_shape(presch_sf) +
  tm_dots(col="red") +
tm_shape(supermarket_sf) + 
  tm_dots(col="blue") +
  tm_view(set.zoom.limits = c(11,14))
tmap_mode('plot')

Handling Aspatial Data

Importing Aspatial Data

Importing the dataset into R dataframe by using read_csv() function of readr package in tidyverse:

hdb_resale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")

Use glimpse() to learn more about the attribute information in the data frame.

glimpse(hdb_resale)

For this Take-Home Exercise, we will only focus on four-room flats transacted from 1st January 2019 to 30th September 2020. Thus, the data set will be filtered according to the focus.

Filtering data frame for relevant data

filter() function of dplyr package in tidyverse is used to filter the data of 4-room flats from 1st January 2019 to 30th September 2020

mutate() function of dplyr package in tidyverse is used to add two new columns to hdbresale_jan19sep20: address (consisting of block and street name) and remaininglease_years (remaining lease in terms of years only)

hdb_resale_jan19sep20 <- hdb_resale %>%
                filter(month %in% c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05", "2019-06", "2019-07", "2019-08", "2019-09", "2019-10", "2019-11", "2019-12","2020-01", "2020-02", "2020-03", "2020-04", "2020-05", "2020-06", "2020-07", "2020-08", "2020-09")) %>%
                filter(flat_type == "4 ROOM")

remaining_lease_list <- strsplit(hdb_resale_jan19sep20$remaining_lease, " ")
hdb_resale_jan19sep20$remaininglease_years <- 0
for (x in 1:length(remaining_lease_list)) {
  if (length(unlist(remaining_lease_list[x])) > 2) {
      year <- as.numeric(unlist(remaining_lease_list[x])[1])
      month <- as.numeric(unlist(remaining_lease_list[x])[3])
      hdb_resale_jan19sep20$remaininglease_years[x] <- year + round(month/12, 2)
  }
  else {
    year <- as.numeric(unlist(remaining_lease_list[x])[1])
    hdb_resale_jan19sep20$remaininglease_years[x] <- year
  }
}

hdb_resale_data <- hdb_resale_jan19sep20 %>%
                  mutate(hdb_resale_jan19sep20, address = paste(block, street_name)) %>%
                  select(month, town, flat_type, address, block, street_name, storey_range, floor_area_sqm, flat_model, lease_commence_date, remaining_lease, remaininglease_years, resale_price)
hdb_resale_data
# A tibble: 15,901 x 13
   month   town       flat_type address block street_name storey_range
   <chr>   <chr>      <chr>     <chr>   <chr> <chr>       <chr>       
 1 2019-01 ANG MO KIO 4 ROOM    204 AN~ 204   ANG MO KIO~ 01 TO 03    
 2 2019-01 ANG MO KIO 4 ROOM    175 AN~ 175   ANG MO KIO~ 07 TO 09    
 3 2019-01 ANG MO KIO 4 ROOM    543 AN~ 543   ANG MO KIO~ 01 TO 03    
 4 2019-01 ANG MO KIO 4 ROOM    118 AN~ 118   ANG MO KIO~ 04 TO 06    
 5 2019-01 ANG MO KIO 4 ROOM    411 AN~ 411   ANG MO KIO~ 04 TO 06    
 6 2019-01 ANG MO KIO 4 ROOM    546 AN~ 546   ANG MO KIO~ 10 TO 12    
 7 2019-01 ANG MO KIO 4 ROOM    428 AN~ 428   ANG MO KIO~ 07 TO 09    
 8 2019-01 ANG MO KIO 4 ROOM    173 AN~ 173   ANG MO KIO~ 04 TO 06    
 9 2019-01 ANG MO KIO 4 ROOM    607 AN~ 607   ANG MO KIO~ 10 TO 12    
10 2019-01 ANG MO KIO 4 ROOM    603 AN~ 603   ANG MO KIO~ 07 TO 09    
# ... with 15,891 more rows, and 6 more variables:
#   floor_area_sqm <dbl>, flat_model <chr>,
#   lease_commence_date <dbl>, remaining_lease <chr>,
#   remaininglease_years <dbl>, resale_price <dbl>

There is a total of 15901 4-room flat transactions from 1st January 2019 to 30 September 2020 after filtering.

It is noticed that street name such as “ST. GEORGE’S” has problem when obtaining the latitude and longitude in the later part. The short form “ST.” is replaced with its full name “SAINT”.

hdb_resale_data$address <- sub('ST\\.','SAINT',hdb_resale_data$address)

Getting coordinates (latitude and longitude) for each 4-room flat

To get the coordinates for each 4-room flat, the address column in hdb_resale_data is used in OneMap Search API which returns search results with both latitude and longitude. Obtain the result using GET() function of httr package and add the values of the new columns, LATITUDE and LONGITUDE respectively into each row of hdb_resale_data.

for (x in 1:nrow(hdb_resale_data)) {
  address <- hdb_resale_data[x,'address']
  
  url = paste('https://developers.onemap.sg/commonapi/search?searchVal=', address, '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
  latlong_url <- URLencode(url)
  
  latlong <- GET(latlong_url)
  
  jsonlatlong <- content(latlong,as="parsed")
  
  if (length(jsonlatlong$results) > 0) {
    hdb_resale_data[x,'LATITUDE'] = jsonlatlong$results[[1]]$LATITUDE
    hdb_resale_data[x,'LONGITUDE'] = jsonlatlong$results[[1]]$LONGITUDE
  }
}

Checking the rows for any missing data in the LATITUDE and LONGITUDE column

is.na() is used to check for the total number of missing value (NA)

sum(is.na(hdb_resale_data$LATITUDE))
[1] 0
sum(is.na(hdb_resale_data$LONGITUDE))
[1] 0

There is no missing data observed.

Creating a csv to keep the results above

To avoid going through the long process above again, data will be exported to the new csv.

write.csv(hdb_resale_data, "data/aspatial/hdb_resale_latlong.csv", row.names = FALSE)

Structural Factors

Importing the updated csv file

resale_data <- read_csv("data/aspatial/hdb_resale_latlong.csv")

Remaining Lease

Obtain from the column remaininglease_years which was converted from remaining_lease into years as a unit instead of years and months.

lease <- resale_data$remaininglease_years

Floor Level

The floor level is provided in a range instead of individual levels. unique() of the base function is used to identify the unique level range.

length(unique(resale_data$storey_range))
[1] 17

There is 17 unique floor level range.

Locational Factors

Creating a simple feature data frame from an aspatial data frame

resale_data_sf <- st_as_sf(resale_data, 
                         coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
  st_transform(crs = 3414)
resale_data_sf <- st_as_sf(resale_data, 
                         coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
  st_transform(crs = 3414)
resale_geometry <- st_coordinates(resale_data_sf$geometry)
eldercare_geometry <- st_coordinates(eldercare_sf$geometry)
dist_resale_eldercare <- spDists(resale_geometry, eldercare_geometry, longlat=FALSE)

Computing proximity matrix

Create a function to get the distance between the HDB Flat and factors mentioned in the dataset using st_distance() from sf package and dist() is used to calculate the euclidean distance

proximity <- function(df_1, df_2, var) {
  
  df_1_geometry <- st_coordinates(df_1$geometry)
  df_2_geometry <- st_coordinates(df_2$geometry)
  df_2_geometry <- df_2_geometry[,c(1,2)]
  dist_matrix <- spDists(df_1_geometry, df_2_geometry, longlat=FALSE)
  dist_matrix_df <- data.frame(dist_matrix) 
  dist_matrix_min <- dist_matrix_df %>% rowwise() %>% mutate(Min = min(c_across(1:(ncol(dist_matrix_df)))))
  df_1[,var] <- dist_matrix_min$Min/1000
  
  return(df_1)
}
resale_data_sf <- proximity(resale_data_sf, eldercare_sf, "PROX_ELDERCARE")
resale_data_sf <- proximity(resale_data_sf, park_sf, "PROX_PARK")
resale_data_sf <- proximity(resale_data_sf, childcare_sf, "PROX_CHILDCARE")
resale_data_sf <- proximity(resale_data_sf, mrtlrt_sf, "PROX_MRTLRT")
resale_data_sf <- proximity(resale_data_sf, presch_sf, "PROX_PRESCH")
resale_data_sf <- proximity(resale_data_sf, supermarket_sf, "PROX_SUPERMARKET")

Creating a csv to keep the results above

write.csv(resale_data_sf, "data/aspatial/resale_final.csv", row.names = FALSE)

Exploratory Data Analysis

Importing Final Dataset

resale_data <- read_csv("data/aspatial/resale_final.csv")

EDA using statistical graphics

Plotting the distribution of resale_price by using appropriate Exploratory Data Analysis (EDA)

ggplot(data=resale_data, aes(x=`resale_price`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

The figure above reveals a right skewed distribution. This means that more 4 room type units were transacted at relative lower prices.

Statistically, the skewed distribution can be normalised by using log transformation. mutate() of dplyr package is used too add a new variable called log_resale_price using a log transformation on the variable resale_price.

resale_data <- resale_data %>%
  mutate(`log_resale_price` = log(resale_price))

Plotting the distribution of log_resale_price

ggplot(data=resale_data, aes(x=`log_resale_price`)) +
  geom_histogram(bins=20, color="black", fill="light blue")